Similar resources
Building Bilingual Dictionaries from Parallel Web Documents
In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique and a ...
Mining the Web for Bilingual Text
STRAND (Resnik) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. The most recent end product is an automatically acquired parallel corpus comprising English-French documen...
Mining Domain Specific Words from Web Documents
Web pages provide not only plain text materials for training language models but also tag information for semantics annotation. The tags could be found either explicitly in the HTML documents or implicitly through the directory hierarchy of the documents, since the directory hierarchy can be regarded as a kind of classification tree for web documents, which assigns an implicit hidden tag to eac...
Mining Web Documents for Unintended Information Revelation
This research concerns web site information security. With an increasing number of documents being generated by different individuals and departments in organizations, there is a potential of releasing information which is inconsistent with the overall goals, objectives and operation of the organization. We refer to this as unintended information revelation (UIR). This paper focuses on progress...
Web Mining: Clustering Web Documents A Preliminary Review
Evidently there is a tremendous proliferation in the amount of information found today on the largest shared information source, the World Wide Web (or simply the Web). The process of finding relevant information on the web can be overwhelming. Even with the presence of today’s search engines that index the web it is hard to wade through the large number of returned documents in a response to a...
Journal
Journal title: Procedia Computer Science
Year: 2016
ISSN: 1877-0509
DOI: 10.1016/j.procs.2016.06.103